KIPS Transactions on Software and Data Engineering

Korean Title: GAN으로 합성한 음성의 충실도 향상
English Title: Improving Fidelity of Synthesized Voices Generated by Using GANs
Author: 백문기, 윤승원, 이상백, 이규철 (Moon-Ki Back, Seung-Won Yoon, Sang-Baek Lee, Kyu-Chul Lee)
Citation: Vol. 10, No. 1, pp. 9-18 (January 2021)
Korean Abstract (translated into English)
Generative Adversarial Networks (GANs) have gained great popularity in computer vision and related fields, but no GAN that directly generates audio signals has yet been presented. Unlike an image, an audio signal is a sampled signal composed of discrete values, so it is difficult to learn with the CNN architectures widely used for image generation. To address this constraint, GAN researchers have recently proposed a strategy of applying time-frequency representations of audio signals to existing image-generating GANs. Following this strategy, this paper proposes an improved method for increasing the fidelity of audio signals generated with GANs. The method was validated on a public speech dataset and evaluated using the Fréchet Inception Distance (FID). The existing state-of-the-art method showed an FID of 11.973, while the method proposed in this study showed an FID of 10.504 (the lower the FID, the higher the fidelity).
English Abstract
Although Generative Adversarial Networks (GANs) have gained great popularity in computer vision and related fields, a GAN that generates audio signals directly has yet to be presented. Unlike an image, an audio signal is a sampled signal consisting of discrete samples, so it is not easy to learn with CNN architectures, which are widely used in image generation tasks. To overcome this difficulty, GAN researchers have proposed a strategy of applying time-frequency representations of audio to existing image-generating GANs. Following this strategy, we propose an improved method for increasing the fidelity of synthesized audio signals generated by GANs. Our method is demonstrated on a public speech dataset and evaluated by Fréchet Inception Distance (FID). Our method achieved an FID of 10.504, compared with 11.973 for the existing state-of-the-art method (lower FID indicates better fidelity).
Keyword: Generative Adversarial Networks, Fréchet Inception Distance, Fidelity Improvement, Synthesized Voice
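
The abstract refers to time-frequency representations of audio without naming a specific one. As a minimal sketch, assuming a short-time Fourier transform (STFT) magnitude is the representation in question (the paper's actual choice is not stated in this record), the following pure-Python example turns a 1-D sampled signal into a 2-D array of the kind an image-generating network could consume. The function names, frame length, hop size, and test tone are all hypothetical choices made for illustration.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return one row per frame, each holding frame_len // 2 + 1 magnitudes.

    Illustrative only: a naive O(N^2) DFT is used for clarity, not speed.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the current frame to reduce spectral leakage.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            bins.append(abs(acc))
        frames.append(bins)
    return frames


# Assumed test input: a 1 kHz sine sampled at 8 kHz (not from the paper).
sample_rate = 8000
tone_hz = 1000.0
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = stft_magnitude(signal)  # 2-D "image": time frames x frequency bins
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64  # bin index -> frequency in Hz
print(len(spec), len(spec[0]), peak_hz)
```

Each row of `spec` is one time frame and each column one frequency bin, so the discrete sample sequence becomes a grid with local structure along both axes. In practice a library FFT would replace the naive DFT loop.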